HOPI: An Efficient Connection Index for Complex XML Document Collections
Identifieur interne : 000B06 ( Main/Exploration ); précédent : 000B05; suivant : 000B07HOPI: An Efficient Connection Index for Complex XML Document Collections
Auteurs : Ralf Schenkel [Allemagne] ; Anja Theobald [Allemagne] ; Gerhard Weikum [Allemagne]Source :
- Lecture Notes in Computer Science [ 0302-9743 ] ; 2004.
English descriptors
- Teeft :
- Algorithm, Ancestor queries, Arbitrary graphs, Center graph, Center graphs, Center node, Closure, Complete dblp, Compression factor, Compression ratio, Connection index, Cout, Data collections, Database, Dblp, Densest, Densest subgraph, Densest subgraphs, Descendants, Document, Document graph, Entire graph, Experiments show, Graph, Hopi, Hopi index, Index structure, Index structures, Large graphs, Lout, Megabyte, Node, Partitioning, Path expressions, Path queries, Query, Query performance, Schenkel, Search engine, Sigmod, Single document, Span documents, Subgraph, Subgraphs, Theobald, Transitive, Transitive closure, Tree signatures, Vout, Weikum, Wildcards.
Abstract
Abstract: In this paper we present HOPI, a new connection index for XML documents based on the concept of the 2–hop cover of a directed graph introduced by Cohen et al. In contrast to most of the prior work on XML indexing we consider not only paths with child or parent relationships between the nodes, but also provide space– and time–efficient reachability tests along the ancestor, descendant, and link axes to support path expressions with wildcards in our XXL search engine. We improve the theoretical concept of a 2–hop cover by developing scalable methods for index creation on very large XML data collections with long paths and extensive cross–linkage. Our experiments show substantial savings in the query performance of the HOPI index over previously proposed index structures in combination with low space requirements.
Url:
DOI: 10.1007/978-3-540-24741-8_15
Affiliations:
Links toward previous steps (curation, corpus...)
- to stream Istex, to step Corpus: 001910
- to stream Istex, to step Curation: 001794
- to stream Istex, to step Checkpoint: 000901
- to stream Main, to step Merge: 000B06
- to stream Main, to step Curation: 000B06
Le document en format XML
<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">HOPI: An Efficient Connection Index for Complex XML Document Collections</title>
<author><name sortKey="Schenkel, Ralf" sort="Schenkel, Ralf" uniqKey="Schenkel R" first="Ralf" last="Schenkel">Ralf Schenkel</name>
</author>
<author><name sortKey="Theobald, Anja" sort="Theobald, Anja" uniqKey="Theobald A" first="Anja" last="Theobald">Anja Theobald</name>
</author>
<author><name sortKey="Weikum, Gerhard" sort="Weikum, Gerhard" uniqKey="Weikum G" first="Gerhard" last="Weikum">Gerhard Weikum</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:EF8BDDE90350CC3E6406E65AC136E9959B0FA3E9</idno>
<date when="2004" year="2004">2004</date>
<idno type="doi">10.1007/978-3-540-24741-8_15</idno>
<idno type="url">https://api.istex.fr/document/EF8BDDE90350CC3E6406E65AC136E9959B0FA3E9/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">001910</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Corpus" wicri:corpus="ISTEX">001910</idno>
<idno type="wicri:Area/Istex/Curation">001794</idno>
<idno type="wicri:Area/Istex/Checkpoint">000901</idno>
<idno type="wicri:explorRef" wicri:stream="Istex" wicri:step="Checkpoint">000901</idno>
<idno type="wicri:doubleKey">0302-9743:2004:Schenkel R:hopi:an:efficient</idno>
<idno type="wicri:Area/Main/Merge">000B06</idno>
<idno type="wicri:Area/Main/Curation">000B06</idno>
<idno type="wicri:Area/Main/Exploration">000B06</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">HOPI: An Efficient Connection Index for Complex XML Document Collections</title>
<author><name sortKey="Schenkel, Ralf" sort="Schenkel, Ralf" uniqKey="Schenkel R" first="Ralf" last="Schenkel">Ralf Schenkel</name>
<affiliation wicri:level="3"><country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Max Planck Institut für Informatik, Saarbrücken</wicri:regionArea>
<placeName><region type="land" nuts="2">Sarre (Land)</region>
<settlement type="city">Sarrebruck</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Allemagne</country>
</affiliation>
</author>
<author><name sortKey="Theobald, Anja" sort="Theobald, Anja" uniqKey="Theobald A" first="Anja" last="Theobald">Anja Theobald</name>
<affiliation wicri:level="3"><country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Max Planck Institut für Informatik, Saarbrücken</wicri:regionArea>
<placeName><region type="land" nuts="2">Sarre (Land)</region>
<settlement type="city">Sarrebruck</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Allemagne</country>
</affiliation>
</author>
<author><name sortKey="Weikum, Gerhard" sort="Weikum, Gerhard" uniqKey="Weikum G" first="Gerhard" last="Weikum">Gerhard Weikum</name>
<affiliation wicri:level="3"><country xml:lang="fr">Allemagne</country>
<wicri:regionArea>Max Planck Institut für Informatik, Saarbrücken</wicri:regionArea>
<placeName><region type="land" nuts="2">Sarre (Land)</region>
<settlement type="city">Sarrebruck</settlement>
</placeName>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Allemagne</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2004</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="Teeft" xml:lang="en"><term>Algorithm</term>
<term>Ancestor queries</term>
<term>Arbitrary graphs</term>
<term>Center graph</term>
<term>Center graphs</term>
<term>Center node</term>
<term>Closure</term>
<term>Complete dblp</term>
<term>Compression factor</term>
<term>Compression ratio</term>
<term>Connection index</term>
<term>Cout</term>
<term>Data collections</term>
<term>Database</term>
<term>Dblp</term>
<term>Densest</term>
<term>Densest subgraph</term>
<term>Densest subgraphs</term>
<term>Descendants</term>
<term>Document</term>
<term>Document graph</term>
<term>Entire graph</term>
<term>Experiments show</term>
<term>Graph</term>
<term>Hopi</term>
<term>Hopi index</term>
<term>Index structure</term>
<term>Index structures</term>
<term>Large graphs</term>
<term>Lout</term>
<term>Megabyte</term>
<term>Node</term>
<term>Partitioning</term>
<term>Path expressions</term>
<term>Path queries</term>
<term>Query</term>
<term>Query performance</term>
<term>Schenkel</term>
<term>Search engine</term>
<term>Sigmod</term>
<term>Single document</term>
<term>Span documents</term>
<term>Subgraph</term>
<term>Subgraphs</term>
<term>Theobald</term>
<term>Transitive</term>
<term>Transitive closure</term>
<term>Tree signatures</term>
<term>Vout</term>
<term>Weikum</term>
<term>Wildcards</term>
</keywords>
</textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: In this paper we present HOPI, a new connection index for XML documents based on the concept of the 2–hop cover of a directed graph introduced by Cohen et al. In contrast to most of the prior work on XML indexing we consider not only paths with child or parent relationships between the nodes, but also provide space– and time–efficient reachability tests along the ancestor, descendant, and link axes to support path expressions with wildcards in our XXL search engine. We improve the theoretical concept of a 2–hop cover by developing scalable methods for index creation on very large XML data collections with long paths and extensive cross–linkage. Our experiments show substantial savings in the query performance of the HOPI index over previously proposed index structures in combination with low space requirements.</div>
</front>
</TEI>
<affiliations><list><country><li>Allemagne</li>
</country>
<region><li>Sarre (Land)</li>
</region>
<settlement><li>Sarrebruck</li>
</settlement>
</list>
<tree><country name="Allemagne"><region name="Sarre (Land)"><name sortKey="Schenkel, Ralf" sort="Schenkel, Ralf" uniqKey="Schenkel R" first="Ralf" last="Schenkel">Ralf Schenkel</name>
</region>
<name sortKey="Schenkel, Ralf" sort="Schenkel, Ralf" uniqKey="Schenkel R" first="Ralf" last="Schenkel">Ralf Schenkel</name>
<name sortKey="Theobald, Anja" sort="Theobald, Anja" uniqKey="Theobald A" first="Anja" last="Theobald">Anja Theobald</name>
<name sortKey="Theobald, Anja" sort="Theobald, Anja" uniqKey="Theobald A" first="Anja" last="Theobald">Anja Theobald</name>
<name sortKey="Weikum, Gerhard" sort="Weikum, Gerhard" uniqKey="Weikum G" first="Gerhard" last="Weikum">Gerhard Weikum</name>
<name sortKey="Weikum, Gerhard" sort="Weikum, Gerhard" uniqKey="Weikum G" first="Gerhard" last="Weikum">Gerhard Weikum</name>
</country>
</tree>
</affiliations>
</record>
Pour manipuler ce document sous Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Wicri/Sarre/explor/MusicSarreV3/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000B06 | SxmlIndent | more
Ou
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000B06 | SxmlIndent | more
Pour mettre un lien sur cette page dans le réseau Wicri
{{Explor lien |wiki= Wicri/Sarre |area= MusicSarreV3 |flux= Main |étape= Exploration |type= RBID |clé= ISTEX:EF8BDDE90350CC3E6406E65AC136E9959B0FA3E9 |texte= HOPI: An Efficient Connection Index for Complex XML Document Collections }}
This area was generated with Dilib version V0.6.33. |